# A NOVEL PROPOSED PHYSICAL LAYER ENABLER OF NETWORK ON CHIP (NOC) WBCDMA SYSTEM FEATURES BY REDUCING SYSTEM COMPLEXITY

## <sup>1</sup>B.Swetha, <sup>2</sup>K.Srinu, <sup>3</sup>G.Shiva, <sup>4</sup>Mallela Pranava Sri

<sup>1,2,3</sup>Assistant Professor, <sup>4</sup>UG Student, <sup>1,2,3,4</sup>Dept. of Electronics and Communication Engineering, Visvesvaraya College of Engineering and Technology, Mangalpalle, Telangana, India.

## ABSTRACT

Code Division Multiple Access (CDMA) has been proposed as a physical-layer enabler of Network-on-Chip (NoC) interconnects, offering fixed latency, guaranteed service, and reduced system complexity. The NoC community adopted CDMA interconnects from wireless communications, where each bit of a CDMA-encoded data word is sent on a separate channel to prevent interference. On-chip interconnects, however, can effectively mitigate the interference problem, so the CDMA channel need not be replicated. In addition, wireless channels are sequential by nature, whereas parallel buses are the standard form of on-chip communication. Since its adoption by the NoC community, the wireless CDMA scheme has been retained unchanged: each data bit is encoded in a separate CDMA channel, and the encoding/decoding logic is replicated across the width of the data packet. In this work, we present a novel CDMA encoding/decoding scheme called Aggregated CDMA (ACDMA) for NoC interconnects, in which all packet bits are encoded in a single CDMA channel, thereby eliminating the area and energy overheads of replicating the channel encoding/decoding logic. Mitigating the overhead of channel replication yields up to 60.5% area and 55% power savings, with a 124% improvement in throughput per area, compared to the conventional CDMA crossbar. As future work, we plan to build and evaluate a full ACDMA-based NoC under different workloads and routing protocols.

# **INTRODUCTION**

The spreading codes used in previously proposed CDMA NoCs must possess both the orthogonal and the balance properties, since these NoCs are based on a digital encoding and decoding technique. The Walsh code is frequently used for this purpose. The Walsh-code-based (WB) encoding and decoding approach, however, has a number of intrinsic drawbacks. 1) Design Complexity: In the encoding scheme, the coded data are combined using an arithmetic addition logic unit, whose logic cost rises with the number of senders. The decoding scheme extracts the source data from the mixed data chips using a key demux-accumulation-compare unit (in this brief, each bit of a spreading code is called a chip, and thus the encoded data are called data chips), and this unit occupies a large area. 2) Low Code Utilization: In an S-chip Walsh code set, S must be equal to $2^N$, where N is a natural number, and at most S − 1 sequences can be used to encode the original data, which wastes sequences in the code set. For example, a 16-node network needs a 32-chip Walsh code set, because a 16-chip Walsh code set can provide only 15 sequences for data encoding and thus cannot satisfy the requirement of 16 sequences, one for each node.
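The code-utilization argument can be checked with a short Python sketch. It assumes the Walsh set is built with the standard Sylvester (Hadamard) construction and that a sequence is usable only if it is balanced (equal numbers of 0 and 1 chips); the function names are hypothetical and are not taken from the cited designs.

```python
def walsh_codes(s):
    """Return an S x S Walsh code set as lists of 0/1 chips (S must be a power of two)."""
    if s < 1 or s & (s - 1):
        raise ValueError("S must be a power of two")
    codes = [[0]]
    while len(codes) < s:
        # Sylvester construction: [[H, H], [H, NOT H]]
        codes = [row + row for row in codes] + \
                [row + [1 - c for c in row] for row in codes]
    return codes


def usable_sequences(s):
    """Count the balanced sequences in an S-chip Walsh set (assumed usability criterion)."""
    return sum(1 for row in walsh_codes(s) if 2 * sum(row) == s)


def min_walsh_length(nodes):
    """Smallest power-of-two S whose S - 1 usable sequences cover `nodes` senders."""
    s = 2
    while s - 1 < nodes:
        s *= 2
    return s


if __name__ == "__main__":
    print(usable_sequences(16))   # balanced sequences in a 16-chip set
    print(min_walsh_length(16))   # code length needed for a 16-node network
```

Under these assumptions, the all-zero sequence is the only unbalanced one, which is where the S − 1 limit in the text comes from.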

#### LITERATURE SURVEY

Modern Systems-on-Chips (SoCs) are becoming massively parallel, with many interconnected Processing Elements (PEs). The PEs are commonly interconnected through buses and Networks-on-Chips (NoCs) [1]. In NoCs, exchanged data are bundled into packets that traverse several network layers down to the physical layer, which defines how packets are actually transmitted between NoC units. The physical layer of a NoC is implemented by routers employing crossbar switches. Code Division Multiple Access (CDMA) is a medium-sharing technique that leverages orthogonal codes to enable simultaneous packet routing. Unlike time-shared channels, CDMA uses the code space to enable channel sharing. CDMA has been proposed as an on-chip interconnect technique for both bus and NoC interconnect architectures [2]. The advantages of using CDMA for on-chip interconnects include reduced power consumption, fixed communication latency, and reduced system complexity [3].

The use of CDMA in NoC interconnects is adopted from the wireless communications literature, where the data are spread by orthogonal codes at the transmitters, the spread data are added on the wireless channel, and the received sum is decoded at the receivers. Classical CDMA systems rely on the Walsh orthogonal code family to enable medium sharing. Many research groups have investigated aspects of CDMA in NoCs, including our group, which presented the Overloaded CDMA for on-chip Interconnects (OCI) [4] [5] [6]. A 14-node CDMA-based network has been developed in [7]. The network uses 7 Walsh codes, which are assigned to the network nodes dynamically, based on the request from each node. Two architectures have been introduced in [7]: a serial CDMA network, where each data chip in the spreading code is sent in one clock cycle, and a parallel CDMA network, where all data chips are sent in the same cycle. The serial and parallel CDMA-based networks have been compared to a conventional CDMA network, a mesh-based NoC, and a Time Division Multiple Access (TDMA) bus. For the same network area, the throughput of the parallel CDMA network is higher than that of the mesh-based NoC and the TDMA bus, due to the simultaneous medium access nature of CDMA.

Standard-basis codes are proposed as a replacement for Walsh CDMA codes in [8]. Standard-basis codes resemble TDMA signaling because each code consists of a single chip of one, with the remaining chips being zeros. The orthogonality of these TDMA-like codes enables them to replace the Walsh codes as spreading and despreading CDMA codes, which reduces the complexity of the channel adder and decoder because the sum of the codes is limited to zero or one per clock cycle.
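The TDMA-like behaviour of standard-basis codes can be illustrated with a minimal Python sketch. It assumes each sender contributes its data bit ANDed with its code chip in every cycle; this spreading rule and all function names are illustrative assumptions, not the exact circuit of [8].

```python
def standard_basis_codes(n):
    """N codes of N chips each; code k has a single 1 at chip position k."""
    return [[1 if i == k else 0 for i in range(n)] for k in range(n)]


def channel_sums(bits, codes):
    """Per-cycle channel value, assuming each sender drives (bit AND chip)."""
    n = len(codes)
    return [sum(bits[k] & codes[k][i] for k in range(n)) for i in range(n)]


def decode(sums, code):
    """A receiver reads the channel in the one cycle where its code chip is 1."""
    return sums[code.index(1)]


if __name__ == "__main__":
    codes = standard_basis_codes(4)
    bits = [1, 0, 1, 1]
    sums = channel_sums(bits, codes)
    print(sums, [decode(sums, c) for c in codes])
```

Because only one code has a nonzero chip in any given cycle, the channel value never exceeds one, so under this assumption no multi-bit channel adder is needed.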

## **EXISTING METHOD**

The conventional CDMA crossbar employed in the literature is depicted in Figure 1. The crossbar interconnects N transmit ports to N receive ports using N-chip Walsh spreading codes. The binary data from each transmit port are encoded using an XOR encoder: the data bit is XORed with a unique N-chip spreading code assigned to the transmit-receive pair and transmitted in N clock cycles. The spread data from all encoders are added by the CDMA channel adder, and the sum is sent to all receive ports. The decoder at each receive port extracts the data from the channel sum by correlating the channel sum with the assigned spreading code. The correlation operation is implemented using an accumulator and a multiplexer, since the despreading code chips are unipolar ("0" or "1"). In all of these designs, the CDMA channel and the encoding/decoding logic are replicated W times for data packets of width W, which is a direct application of the wireless CDMA principles to NoC interconnects. However, wireless communication channels are sequential by nature due to the interference problem.
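The single-bit channel described above can be modelled behaviourally in Python. This is a sketch under stated assumptions: Walsh codes come from the Sylvester construction, only the balanced (non-all-zero) sequences are assigned to senders, and the decoder is modelled as two accumulators selected by the despreading chip and followed by a comparison. All function names are hypothetical.

```python
def walsh_codes(n):
    """N x N Walsh code set as 0/1 chips (N must be a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("N must be a power of two")
    codes = [[0]]
    while len(codes) < n:
        codes = [r + r for r in codes] + [r + [1 - c for c in r] for r in codes]
    return codes


def encode(bit, code):
    """XOR encoder: one encoded chip per clock cycle."""
    return [bit ^ chip for chip in code]


def channel_add(encoded):
    """Channel adder: arithmetic sum of all senders' chips in each cycle."""
    return [sum(chips) for chips in zip(*encoded)]


def decode(sums, code):
    """Steer each channel sum by the despreading chip, then compare the totals."""
    acc = [0, 0]  # acc[0]: cycles with chip 0, acc[1]: cycles with chip 1
    for s, chip in zip(sums, code):
        acc[chip] += s
    # A transmitted 1 inverts the code, so its energy lands in the chip-0 cycles.
    return 1 if acc[0] > acc[1] else 0


if __name__ == "__main__":
    codes = walsh_codes(8)[1:]          # skip the unbalanced all-zero sequence
    bits = [1, 0, 1, 1, 0, 0, 1]
    sums = channel_add([encode(b, c) for b, c in zip(bits, codes)])
    print([decode(sums, c) for c in codes] == bits)
```

In this model a W-bit packet would need W independent copies of `encode`, `channel_add`, and `decode`, which is the replication the section refers to.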



Figure 1. Conventional CDMA crossbar.

Multiple access and MIMO techniques can enable concurrent data transmission on the same wireless channel at the expense of increased transmitter/receiver complexity. In on-chip interconnects, on the other hand, a single channel can be used for parallel data transmission, because noise and interference effects can be efficiently mitigated [9]. In this work, we present a single-channel, multi-bit CDMA crossbar, namely the Aggregated CDMA (ACDMA) NoC crossbar.

#### **PROPOSED METHOD**

#### 4.1 NOC Crossbar Architecture

The ACDMA crossbar implements the physical layer of the NoC by interconnecting N transmit (TX) ports to N receive (RX) ports, where the data width of each port is $W = \log_2 \max(d_j)$. The high-level architecture of the ACDMA crossbar, illustrated in Figure 2(a), is composed of three main parts: encoders, channel adder, and decoders. The encoders spread the data from each TX port using W XOR gates, as shown in Figure 2(b). Instead of adding the spreading chips of the Walsh orthogonal code to the result in the encoder block, as suggested by (2), this operation is postponed to the channel adder block in order to merge the channel adder with the spreading code adders. The output of each encoder is, therefore, limited to W-bit width.

The encoder outputs are then added together to form the sum $S_i$ of (3). To minimize the critical path of the channel adder, the addition is done using a tree adder architecture, as depicted in Figure 2(c), where the leaves of the tree are the encoders of the TX ports and the root of the tree is the channel sum output. Because there are N leaves, the height of the tree is $\log_2(N)$. The width of the output wires from each adder in the tree is equal to the width of the input wires plus one, to prevent overflows. Since the input to the first level of adders is (W + 1) bits wide and the height of the adder tree is $\log_2(N)$, the width of the output wires at the root adder is $W + 1 + \log_2(N)$. Pipeline registers are inserted after each stage in the tree to minimize the critical path of the channel. The sum $S_i$ is then sent to all N decoders, one decoder per RX port.
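The tree reduction and the root width quoted above can be sketched in Python as follows. This is a behavioural illustration only: it ignores pipelining and the merged spreading-chip addition, and the function names are hypothetical.

```python
def adder_tree(values):
    """Sum the leaf values pairwise; each loop iteration is one tree level."""
    level = list(values)
    if not level or len(level) & (len(level) - 1):
        raise ValueError("number of leaves must be a power of two")
    stages = 0
    while len(level) > 1:
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
        stages += 1
    return level[0], stages


def root_width(w, n):
    """Root output width from the text: W + 1 + log2(N) bits, N a power of two."""
    return w + 1 + (n.bit_length() - 1)


if __name__ == "__main__":
    total, stages = adder_tree([1, 2, 3, 4, 5, 6, 7, 8])
    print(total, stages, root_width(8, 8))
```

The width grows by one bit per level because the sum of two k-bit values always fits in k + 1 bits, so N leaf inputs of W + 1 bits each can never overflow the root width given in the text.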



Figure 2. (c) ACDMA channel adder, (d) ACDMA decoder.

The decoders implement the cross-correlation of (4) in a cost-efficient manner: each decoder consists of only an adder/subtractor and a register configured as an up/down accumulator, as shown in Figure 2(d). Since the despreading code $C_k$ consists of $\pm 1$ chips, cross-correlation reduces to simple additions and subtractions of the consecutive sums $S_i$. The adder/subtractor adds $S_i$ to, or subtracts it from, the value held in the register according to the despreading chip $C_k^i$: when the chip is '1', $S_i$ is added to the register contents, and when the chip is '-1', $S_i$ is subtracted. At the end of the decoding cycle, the accumulator register holds $N d_k$ according to (5), and because $N = 2^n$ with $n$ an integer, the data $d_k$ is recovered by right-shifting the accumulator content by $\log_2(N)$ bits.
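The up/down accumulator decoding can be illustrated with a small arithmetic model in Python. Because equations (2) to (5) are not reproduced in this text, the sketch assumes that spreading is equivalent to multiplying each W-bit word by a $\pm 1$ Walsh chip and that each of the N ports uses a distinct row of the N × N code set; the function names are hypothetical.

```python
def walsh_pm1(n):
    """N x N Walsh code set with +1/-1 chips (N must be a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("N must be a power of two")
    codes = [[1]]
    while len(codes) < n:
        codes = [r + r for r in codes] + [r + [-c for c in r] for r in codes]
    return codes


def channel_sums(words, codes):
    """S_i: sum over all TX ports of (data word x spreading chip) in cycle i."""
    n = len(codes)
    return [sum(d * code[i] for d, code in zip(words, codes)) for i in range(n)]


def decode(sums, code):
    """Up/down accumulator: add S_i on a +1 chip, subtract it on a -1 chip."""
    acc = 0
    for s, chip in zip(sums, code):
        acc = acc + s if chip == 1 else acc - s
    # The accumulator now holds N * d_k; divide by N with a right shift.
    return acc >> (len(code).bit_length() - 1)


if __name__ == "__main__":
    codes = walsh_pm1(4)
    words = [200, 17, 0, 255]            # one 8-bit word per TX port
    sums = channel_sums(words, codes)
    print([decode(sums, c) for c in codes] == words)
```

The whole W-bit word travels through one channel here, in contrast to the per-bit replication of the conventional crossbar.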

The number of two-input XOR gates is the same for both circuits. The improvement of the ACDMA crossbar over the conventional CDMA crossbar is evident in the number of channel adder wires. In the conventional CDMA crossbar, the number of adder wires for the single-bit channel increases by one in each stage due to the additional carry bit. Therefore, the number of adder wires in stage $i$ is equal to $1 + \log_2(N) - i$. For a W-bit word, the number of adder wires increases to $W + W(\log_2(N) - i)$, and since there are $2^i$ adders at each stage, the total number of wires is equal to $\sum_{i=0}^{\log_2 N - 1} 2^i \left(W + W(\log_2(N) - i)\right)$. In the ACDMA crossbar, conversely, the number of adder wires for a W-bit word is $W + \log_2(N) - i$, which makes the total number of wires equal to $\sum_{i=0}^{\log_2 N - 1} 2^i \left(W + \log_2(N) - i\right)$, in which the carry-bit term $\log_2(N) - i$ is a factor of W smaller than in the conventional CDMA crossbar. The reduced number of carry bits of the ACDMA crossbar is the prime reason for its superiority. The number of wires for the decoder accumulator and the number of flip-flops in the decoder registers are proportional to the number of channel wires, that is, to the width of the last stage of the adder. It follows that the complexity of the ACDMA crossbar is on the order of W lower than that of the conventional CDMA crossbar.
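The two wire-count summations can be evaluated directly for a given crossbar size. The Python sketch below transcribes the expressions as they are written in the text, assuming N is a power of two; the function names are hypothetical.

```python
def conventional_wires(n, w):
    """Sum over stages i of 2^i * (W + W * (log2(N) - i))."""
    stages = n.bit_length() - 1          # log2(N) for a power-of-two N
    return sum(2 ** i * (w + w * (stages - i)) for i in range(stages))


def acdma_wires(n, w):
    """Sum over stages i of 2^i * (W + log2(N) - i)."""
    stages = n.bit_length() - 1
    return sum(2 ** i * (w + stages - i) for i in range(stages))


if __name__ == "__main__":
    for n, w in [(4, 8), (16, 8), (16, 32)]:
        print(n, w, conventional_wires(n, w), acdma_wires(n, w))
```

Evaluating both functions over a range of N and W shows how much of the difference comes from the carry-bit term, which is the only term that is scaled by W in the conventional expression.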

#### SIMULATION RESULTS

Fig 3. Design summary

Device Utilization Summary (estimated values)

| Logic Utilization          | Used | Available | Utilization |
|----------------------------|------|-----------|-------------|
| Number of Slices           | 20   | 14752     | 0%          |
| Number of Slice Flip Flops | 19   | 29504     | 0%          |
| Number of 4 input LUTs     | 31   | 29504     | 0%          |
| Number of bonded IOBs      | 16   | 250       | 5%          |
| Number of GCLKs            | 1    | 24        | 4%          |



Fig 4. RTL Schematic

[Simulation waveform, 0 ns to 600 ns. Signals: DataOut[6:0], Clock, Reset, DataIn[6:0], WData[7:0], AA[0:7]. DataIn[6:0] takes the values 0, 98, 77, 43, 57, 86, and DataOut[6:0] shows the same sequence.]

Fig 5. Encoder output


Fig 6. Decoder output

Offset: 4.394 ns (Levels of Logic = 1)
Source: DE/DataOut_6 (FF)
Destination: DataOut<6> (PAD)
Source Clock: Clock rising

| Cell:in->out | Fanout | Gate Delay (ns) | Net Delay (ns) | Logical Name (Net Name)      |
|--------------|--------|-----------------|----------------|------------------------------|
| FDR:C->Q     | 3      | 0.591           | 0.531          | DE/DataOut_6 (DE/DataOut…)   |
| OBUF:I->O    |        | 3.272           |                | DataOut_6_OBUF (DataOut<6…)  |

Fig 7. Encoder output

## CONCLUSION

In this work, we presented the ACDMA NoC crossbar, which enables parallel transmission of multi-bit data packets on a single CDMA channel. In the ACDMA encoding/decoding scheme, all packet bits are encoded in a single CDMA channel, which eliminates the area and energy overheads of replicating the channel encoding/decoding logic. Mitigating the overhead of channel replication yields up to 60.5% area and 55% power savings, with a 124% improvement in throughput per area, compared to the conventional CDMA crossbar. As future work, we plan to build and evaluate a full ACDMA-based NoC under different workloads and routing protocols.

#### REFERENCES

- 1. L. Wang, J. Hao, and F. Wang. Bus-based and NoC infrastructure performance emulation and comparison. In Information Technology: New Generations, 2009. ITNG '09. Sixth International Conference on, pages 855–858, April 2009.
- 2. R. H. Bell, Chang Yong Kang, L. John, and E. E. Swartzlander. CDMA as a multiprocessor interconnect strategy. In Signals, Systems and Computers, 2001. Conference Record of the Thirty-Fifth Asilomar Conference on, volume 2, pages 1246–1250, Nov 2001.
- 3. B. C. C. Lai, P. Schaumont, and I. Verbauwhede. CT-bus: a heterogeneous CDMA/TDMA bus for future SOC. In Signals, Systems and Computers, 2004. Conference Record of the Thirty-Eighth Asilomar Conference on, volume 2, pages 1868–1872, Nov 2004.
- 4. K. E. Ahmed and M. M. Farag. Overloaded CDMA bus topology for MPSoC interconnect. In 2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14), pages 1–7, Dec 2014.
- 5. K. E. Ahmed and M. M. Farag. Enhanced overloaded CDMA interconnect (OCI) bus architecture for on-chip communication. In 2015 IEEE 23rd Annual Symposium on High-Performance Interconnects, pages 78–87, Aug 2015.
- 6. Dinan EH, Jabbari B (1998) Spreading codes for direct sequence CDMA and wideband CDMA cellular networks. IEEE Commun Mag 48–54
- 7. Ahmed KE, Rizk MR, Farag MM (2017) Overloaded CDMA crossbar for network-on-chip. IEEE Trans Very Large Scale Integr (VLSI) Syst 25(6), June 2017
- 8. Wang J, Lu Z, Li Y (2015) A new CDMA encoding/decoding method for on-chip communication network. IEEE Trans Very Large Scale Integr (VLSI) Syst
- 9. Ahmed KE, Rizk MR, Farag MM (2016) Aggregated CDMA crossbar for network-on-chip, 978-1-5090-5721, IEEE
- 10. Karthikeyan A, Senthil Kumar P (2017) GALS implementation of randomly prioritized bufferless routing architecture for 3D NoC. Springer, 20 June 2017
- 11. Latha HN, Palachandra MV, Raoc M (2012) Real time implementation and performance evaluation of WCDMA system over AWGN channel on TMS320C6713DSK. Procedia Technol 4:82–86
- 12. Parthibun P, Prabu T (2013) FPGA implementation of DSSS-wideband WCDMA transmitter and receiver using QPSK. Int J VLSI Embed Syst (IJVES) 04, Article 09155